243 research outputs found

    Timing- and power-driven ALU design training using spreadsheet-based arithmetic exploration

    We describe master's-level design training that combines ALU design exercises based on commercial synthesis tools with arithmetic explorations based on spreadsheets. Despite its limited complexity, the ALU has a few important properties that make it suitable for our training: 1) the ALU subcircuits are diverse and contain both short and long timing paths, 2) timing-driven design is called for, since the ALU is a performance bottleneck, and 3) the ALU is continuously used, making power dissipation an important design parameter. After enforcing strict timing constraints during synthesis of the ALU, the students need to reconsider how to implement the arithmetic block, which initially is too slow. Here, performing arithmetic explorations inside an innovative spreadsheet environment helps to visualize circuit implementation tradeoffs. The final phase in the design training focuses on power analysis and demonstrates that the choice of timing constraint impacts power dissipation.

    Energy-Efficient High-Throughput Staircase Decoders

    We introduce staircase decoder implementations achieving up to 1-Tb/s throughput with energy dissipation of 1.2 pJ/information bit. The implementations are estimated to achieve >10.5 dB of net coding gain depending on the configuration.

    Exploring early and late ALUs for single-issue in-order pipelines

    In-order processors are key components in energy-efficient embedded systems. One important design aspect of in-order pipelines is the sequence of pipeline stages: First, the position of the execute stage, in which arithmetic logic unit (ALU) operations and branch prediction are handled, impacts the number of stall cycles that are caused by data dependencies between data memory instructions and their consuming instructions and by address generation instructions that depend on an ALU result. Second, the position of the ALU inside the pipeline impacts the branch penalty. This paper considers how best to use ALU resources inside a single-issue in-order pipeline. We begin by analyzing the most efficient placement of a single ALU in an in-order pipeline. We then evaluate the most efficient way to use two ALUs, one early and one late, a technique that has revitalized commercial in-order processors in recent years. Our architectural simulations, which are based on 20 MiBench and 7 SPEC2000 integer benchmarks and a 65-nm postlayout netlist of a complete pipeline, show that utilizing two ALUs in different stages of the pipeline gives better performance and energy efficiency than any other pipeline configuration with a single ALU.

    Data-Width-Driven Power Gating of Integer Arithmetic Circuits

    When performing narrow-width computations, power gating of unused arithmetic circuit portions can significantly reduce leakage power. We deploy coarse-grain power gating in 32-bit integer arithmetic circuits that frequently will operate on narrow-width data. Our contributions include a design framework that automatically implements coarse-grain power-gated arithmetic circuits considering a narrow-width input data mode, and an analysis of the impact of circuit architecture on the efficiency of this data-width-driven power gating scheme. As an example, with a performance penalty of 6.7%, coarse-grain power gating of a 45-nm 32-bit multiplier is demonstrated to yield an 11.6x static leakage energy reduction per 8x8-bit operation

    On Regularity and Integrated DFM Metrics

    Transistor geometries are well into the nanometer regime, keeping with Moore's Law. With this scaling in geometry, problems not significant in the larger geometries have come to the fore. These problems, collectively termed variability, stem from second-order effects due to the small geometries themselves and from engineering limitations in creating the small geometries. The engineering obstacles have a few solutions, which are yet to be widely adopted because of the cost of deploying them. Addressing and mitigating variability due to second-order effects comes largely under the purview of device engineers and, to a lesser extent, of design practices. Passive layout measures that ease these manufacturing limitations by regularizing the different layout pitches have been explored in the past. However, the question of the best design practice to combat systematic variations is still open. In this work we explore considerations for the regular layout of the exclusive-OR gate, the half-adder and full-adder cells implemented with varying degrees of regularity. Tradeoffs such as complete interconnect unidirectionality and the inevitable introduction of vias are qualitatively analyzed, and some factors affecting the analysis are presented. Finally, results from the Calibre Critical Feature Analysis (CFA) of the cells are used to evaluate the qualitative analysis.

    Fiber-on-Chip: Digital FPGA Emulation of Channel Impairments for Real-Time Evaluation of DSP

    We describe the Fiber-on-Chip (FoC) approach, in which digital models are used for real-time emulation of an optical communication system, to achieve cost-effective and reproducible long-term DSP evaluations inside a single chip.

    Energy-Efficient Implementation of Carrier Phase Recovery for Higher-Order Modulation Formats

    We introduce circuit implementations of one- and two-stage carrier phase recovery (CPR) for 256QAM coherent optical receivers. We describe in detail the optimizations of algorithms, such as modified Viterbi-Viterbi (mVV), blind phase search (BPS), and principal component-based phase estimation (PCPE), that are required to develop energy-efficient CPR circuits and show how design parameter settings and limited fixed-point resolution affect the SNR penalty. 30-GBaud CPR circuit netlists synthesized in a 22-nm CMOS process technology allow us to study trade-offs between energy per bit and SNR penalty. We show that it is possible to reach an energy dissipation of around 1 pJ/bit at an SNR penalty of 0.6 dB for two-stage PCPE+BPS and mVV+BPS implementations, and that PCPE+BPS is the preferred choice thanks to its smaller area

    Fiber-on-Chip: Digital Emulation of Channel Impairments for Real-Time DSP Evaluation

    We describe the Fiber-on-Chip (FoC) approach to verification of digital signal processing (DSP) circuits, where digital models of a fiber-optic communication system are implemented in the same hardware as the DSP under test. The approach can enable cost-effective long-term DSP evaluations without the need for complex optical-electronic testbeds with high-speed interfaces, shortening verification time and enabling deep bit-error rate evaluations. Our FoC system currently contains a digital model of a transmitter generating a pseudo-random bitstream and a digital model of a channel with additive white Gaussian noise, phase noise and polarization-mode dispersion. In addition, the FoC system contains digital features for real-time control of channel parameters, using low-speed communication interfaces, and for autonomous real-time analysis, which enable us to batch multiple unsupervised emulations on the same hardware. The FoC system can target both field-programmable gate arrays, for fast evaluation of fixed-point logic, and application-specific integrated circuits, for accurate power dissipation measurements
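The transmitter and channel models named in the abstract above can be illustrated with a small software sketch. This is not the authors' hardware implementation: it is a floating-point Python illustration that assumes QPSK modulation, a 7-bit LFSR as the pseudo-random source, and a random-walk phase-noise model, and it omits polarization-mode dispersion entirely. All function names and parameters are hypothetical.

```python
import cmath
import math
import random


def prbs7(n_bits, state=0x7F):
    """Pseudo-random bits from a 7-bit LFSR (feedback from the two oldest stages)."""
    bits = []
    for _ in range(n_bits):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        bits.append(new_bit)
    return bits


def qpsk_map(bits):
    """Map bit pairs to unit-energy QPSK symbols (assumed modulation format)."""
    scale = 1 / math.sqrt(2)
    return [complex(1 - 2 * bits[i], 1 - 2 * bits[i + 1]) * scale
            for i in range(0, len(bits) - 1, 2)]


def channel(symbols, snr_db, phase_step_std, rng):
    """Apply random-walk phase noise and additive white Gaussian noise.

    snr_db is the symbol-energy-to-noise-density ratio; phase_step_std is the
    standard deviation (radians) of the per-symbol phase increment.
    """
    sigma = math.sqrt(10 ** (-snr_db / 10) / 2)  # noise std per dimension
    phase = 0.0
    received = []
    for s in symbols:
        phase += rng.gauss(0.0, phase_step_std)
        noise = complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
        received.append(s * cmath.exp(1j * phase) + noise)
    return received


def qpsk_demap(symbols):
    """Hard-decision demapping by the sign of each quadrature component."""
    bits = []
    for s in symbols:
        bits.append(1 if s.real < 0 else 0)
        bits.append(1 if s.imag < 0 else 0)
    return bits


def count_bit_errors(sent, received):
    """Number of positions where the two bit sequences differ."""
    return sum(a != b for a, b in zip(sent, received))


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so a run is reproducible
    tx_bits = prbs7(2000)
    rx = channel(qpsk_map(tx_bits), snr_db=10.0, phase_step_std=0.01, rng=rng)
    print("bit errors:", count_bit_errors(tx_bits, qpsk_demap(rx)))
```

Because the receiver in this sketch performs no carrier phase recovery, the accumulated phase drift will eventually cause errors. A DSP block under test would sit between `channel` and `qpsk_demap`.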